Code Chunk
pacman::p_load(jsonlite, tidygraph, ggraph, kableExtra,
               visNetwork, graphlayouts, ggforce, textstem,
               skimr, tidytext, tidyverse, gganimate, dplyr, lubridate, DT)
Joshua TING
May 18, 2024
June 20, 2024
Setting the Scene
The business community in Oceanus is dynamic with new startups, mergers, acquisitions, and investments. FishEye International closely watches business records to keep tabs on commercial fishing operators. FishEye’s goal is to identify and prevent illegal fishing in the region’s sensitive marine ecosystem. Analysts are working with company records that show ownership, shareholders, transactions, and information about the typical products and services of each entity. FishEye’s analysts have a hybrid automated/manual process to transform the data into CatchNet: the Oceanus Knowledge Graph.
In the past year, Oceanus’s commercial fishing business community was rocked by the news that SouthSeafood Express Corp was caught fishing illegally. FishEye wants to understand temporal patterns and infer what may be happening in Oceanus’s fishing marketplace because of SouthSeafood Express Corp’s illegal behavior and eventual closure. The competitive nature of Oceanus’s fishing market may cause some businesses to react aggressively to capture SouthSeafood Express Corp’s business while other reactions may come from the awareness that illegal fishing does not go undetected and unpunished.
Question 1: FishEye analysts want to better visualize changes in corporate structures over time. Create a visual analytics approach that analysts can use to highlight temporal patterns and changes in corporate structures. Examine the most active people and businesses using visual analytics.
Question 4: Identify the network associated with SouthSeafood Express Corp and visualize how this network and competing businesses change as a result of their illegal fishing behavior. Which companies benefited from SouthSeafood Express Corp’s legal troubles? Are there other suspicious transactions that may be related to illegal fishing? Provide visual evidence for your conclusions.
fromJSON() of the jsonlite package is used to import MC3.json into the R environment as mc3_data.
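As a minimal sketch of the import step, the example below writes a tiny node-link JSON file to a temporary location and reads it back with fromJSON(). The toy content and the names toy_json, toy_path and toy_data are illustrative assumptions, not the real MC3 data; for the actual file the call would presumably take the path to MC3.json, which depends on the project layout.

```r
library(jsonlite)

# Toy node-link JSON (illustrative only, not the real MC3 content)
toy_json <- '{
  "nodes": [
    {"id": "A", "type": "Entity.Person"},
    {"id": "B", "type": "Entity.Organization.Company"}
  ],
  "links": [
    {"source": "A", "target": "B", "type": "Event.Owns.Shareholdership"}
  ]
}'

# Write the toy JSON to a temporary file and import it
toy_path <- tempfile(fileext = ".json")
writeLines(toy_json, toy_path)
toy_data <- fromJSON(toy_path)

# fromJSON() simplifies arrays of records into data frames
class(toy_data$nodes)
class(toy_data$links)
```

The result is a list whose nodes and links elements are data frames, which is the shape that the later as_tibble() calls rely on.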
The code chunk below will be used to extract the links data.frame of mc3_data and save it as a tibble data.frame called mc3_edges.
mc3_edges <- as_tibble(mc3_data$links) %>%
  unnest(source) %>%
  distinct() %>%                                # drop exact duplicate records
  mutate(source = as.character(source),
         target = as.character(target),
         type = as.character(type),
         startdate = as_datetime(start_date)) %>%
  group_by(source, target, type, startdate) %>%
  summarise(weights = n()) %>%                  # count records per link
  filter(source != target) %>%                  # remove self-loops
  ungroup()
head(mc3_edges)
# A tibble: 6 × 5
source target type startdate weights
<chr> <chr> <chr> <dttm> <int>
1 4. SeaCargo Ges.m.b.H. Dry CreekRybachit Ma… Even… 2034-12-31 00:00:00 1
2 4. SeaCargo Ges.m.b.H. KambalaSea Freight I… Even… 2033-04-12 00:00:00 1
3 9. RiverLine CJSC SumacAmerica Transpo… Even… 2028-12-02 00:00:00 1
4 Aaron Acosta Manning-Pratt Even… 2008-09-14 00:00:00 1
5 Aaron Acosta Manning-Pratt Even… 2008-07-30 00:00:00 1
6 Aaron Allen Hicks-Calderon Even… 2025-03-06 00:00:00 1
Splitting Words
[1] "tbl_df" "tbl" "data.frame"
Learning from Code Chunk Above
distinct() is used to ensure that there will be no duplicated records.
mutate() and as.character() are used to convert the field data type from list to character.
group_by() and summarise() are used to count the number of unique links.
filter(source != target) removes records whose source and target are the same, i.e. self-loops.
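The effect of these steps can be seen on a small made-up table. This is only a sketch: toy_links and its dataset column are hypothetical and stand in for the extra fields carried by the real links data.

```r
library(dplyr)
library(tibble)

# Hypothetical link records (not taken from MC3.json)
toy_links <- tribble(
  ~source, ~target, ~type,  ~dataset,
  "A",     "B",     "owns", "x",
  "A",     "B",     "owns", "x",   # exact duplicate, dropped by distinct()
  "A",     "B",     "owns", "y",   # same link, differs in another field
  "C",     "C",     "owns", "x"    # self-loop, dropped by filter()
)

toy_edges <- toy_links %>%
  distinct() %>%
  group_by(source, target, type) %>%
  summarise(weights = n(), .groups = "drop") %>%
  filter(source != target)

toy_edges
```

Only the A to B link survives, with a weight of 2, because two distinct records describe the same source, target and type.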
The code chunk below will be used to extract the nodes data.frame of mc3_data and save it as a tibble data.frame called mc3_nodes.
# extract all nodes from graph
mc3_nodes <- as_tibble(mc3_data$nodes) %>%
  mutate(country = as.character(country),
         id = as.character(id),
         revenue = as.numeric(as.character(revenue)),
         type = as.character(type)) %>%
  select(id, country, type, revenue)

# extract all nodes from edges
id1 <- mc3_edges %>%
  select(source, type) %>%
  rename(id = source) %>%
  mutate(country = NA, revenue = NA) %>%
  select(id, country, type, revenue)

id2 <- mc3_edges %>%
  select(target, type) %>%
  rename(id = target) %>%
  mutate(country = NA, revenue = NA) %>%
  select(id, country, type, revenue)

additional_nodes <- rbind(id1, id2) %>%
  distinct() %>%
  filter(!id %in% mc3_nodes[["id"]])

# combine all nodes
mc3_nodes_updated <- rbind(mc3_nodes, additional_nodes) %>%
  distinct()
head(mc3_nodes_updated)
# A tibble: 6 × 4
id country type revenue
<chr> <chr> <chr> <dbl>
1 Abbott, Mcbride and Edwards Uziland Entity.Organization.Company 5995.
2 Abbott-Gomez Mawalara Entity.Organization.Company 71767.
3 Abbott-Harrison Uzifrica Entity.Organization.Company 0
4 Abbott-Ibarra Islavaragon Entity.Organization.Company 0
5 Abbott-Sullivan Oceanus Entity.Organization.Company 4747.
6 Acevedo and Sons Imazam Entity.Organization.Company 46567.
Learning from Code Chunk Above
mutate() and as.character() are used to convert the field data type from list to character.
To convert revenue from list data type to numeric data type, the values are first converted to character with as.character(); as.numeric() is then used to convert them to numeric data type.
select() is used to re-organise the order of the fields.
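The two-step conversion can be illustrated with a small made-up tibble. This is a sketch under assumptions: toy_nodes and its values are hypothetical, and the NULL entry stands in for a missing value as it might appear in a list column after a JSON import.

```r
library(dplyr)
library(tibble)

# Hypothetical nodes with list-typed columns (not the real MC3 nodes)
toy_nodes <- tibble(
  id = list("Company A", "Company B", "Company C"),
  revenue = list(71767.1, NULL, "46567")
)

toy_clean <- toy_nodes %>%
  mutate(id = as.character(id),
         # list -> character -> numeric; "NULL" cannot be parsed and becomes NA,
         # so the coercion warning is suppressed here
         revenue = suppressWarnings(as.numeric(as.character(revenue))))

toy_clean
```

Going through character first matters because as.numeric() cannot be applied directly to a list that contains NULL or mixed element types.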
skim() of the skimr package is used to display the summary statistics of the mc3_edges tibble data frame:
| Name | mc3_edges |
| Number of rows | 75815 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| numeric | 1 |
| POSIXct | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| source | 0 | 1.0 | 6 | 42 | 0 | 51996 | 0 |
| target | 0 | 1.0 | 6 | 48 | 0 | 8926 | 0 |
| type | 0 | 1.0 | 14 | 31 | 0 | 4 | 0 |
| event2 | 0 | 1.0 | 4 | 18 | 0 | 3 | 0 |
| event3 | 14908 | 0.8 | 15 | 19 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| weights | 0 | 1 | 1 | 0.01 | 1 | 1 | 1 | 1 | 2 | ▇▁▁▁▁ |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| startdate | 90 | 1 | 1952-05-31 | 2035-12-29 | 2023-12-12 | 11468 |
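A summary of this kind is what skim() returns when it is called on a data frame. The sketch below uses a small hypothetical tibble, toy_edges, in place of mc3_edges.

```r
library(skimr)
library(tibble)

# Hypothetical edge table (not the real mc3_edges)
toy_edges <- tibble(
  source  = c("A", "B", "C"),
  target  = c("B", "C", "A"),
  weights = c(1L, 1L, 2L)
)

# One summary row per column, grouped by column type when printed
toy_summary <- skim(toy_edges)
toy_summary
```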
datatable() of the DT package is used to display the mc3_edges tibble data frame as an interactive table in the HTML document.
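A minimal sketch of such a call is shown below. The data frame toy_edges is hypothetical, and the filter and pageLength settings are illustrative choices rather than the ones used for mc3_edges.

```r
library(DT)

# Hypothetical edge table (not the real mc3_edges)
toy_edges <- data.frame(
  source  = c("A", "B", "C"),
  target  = c("B", "C", "A"),
  weights = c(1L, 1L, 2L)
)

# Interactive table with per-column filters and five rows per page
toy_table <- datatable(toy_edges,
                       filter = "top",
                       options = list(pageLength = 5))
toy_table
```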
The graph above shows the number of people who are beneficial owners, shareholders, employees or family members of an organisation. Shareholders account for a substantial share of the graph.
5.1 Tidying Nodes
set.seed(123)
mc3_graph %>%
  filter(betweenness_centrality >= 1000000) %>%
  ggraph(layout = "fr") +
  # constant aesthetics go outside aes() so they are not mapped to a legend
  geom_edge_link(alpha = 0.5) +
  geom_node_point(aes(size = betweenness_centrality,
                      color = type),
                  alpha = 0.3) +
  geom_node_label(aes(label = id), repel = TRUE, size = 2.5, alpha = 0.8) +
  scale_size_continuous(range = c(1, 10)) +
  theme_graph() +
  labs(title = 'Initial network visualisation',
       subtitle = 'Entities with betweenness scores >= 1,000,000')
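The plot relies on a graph object, mc3_graph, whose nodes carry a betweenness_centrality column. A minimal sketch of how such an object could be built with tidygraph follows. The toy node and edge tables are hypothetical, and treating the graph as undirected is an assumption made for this example, not necessarily how mc3_graph was constructed.

```r
library(tidygraph)
library(dplyr)
library(tibble)

# Hypothetical three-node path: A - B - C
toy_nodes <- tibble(id = c("A", "B", "C"))
toy_edges <- tibble(from = c("A", "B"),
                    to   = c("B", "C"))

toy_graph <- tbl_graph(nodes = toy_nodes,
                       edges = toy_edges,
                       directed = FALSE,   # assumption for this sketch
                       node_key = "id") %>%
  # nodes are the active component by default, so mutate() adds a node column
  mutate(betweenness_centrality = centrality_betweenness())

node_tbl <- toy_graph %>%
  activate(nodes) %>%
  as_tibble()
node_tbl
```

In this toy graph only B lies on a shortest path between two other nodes, so it is the only node with a non-zero betweenness score.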